31 research outputs found
Fault detection in low voltage networks with smart meters and machine learning techniques
25th International Conference on Electricity Distribution (CIRED 2019), June, Madrid (Spain)
Smart grid data analytics and artificial intelligence techniques are playing an increasingly critical role, becoming the focal point for understanding real-time low voltage grid performance. This new point of view, combining advanced analytics with electrical engineering expertise, makes flexible and efficient electrical grid management a reality. HDCE (Hidrocantábrico Distribución Eléctrica) is the electrical Distribution System Operator for EDP (Energias de Portugal) in Spain, supplying energy to 650,000 customers. Since 2012, the company has replaced 99% of its traditional meters with smart meters. Based on the analysis of smart metering voltage alarms recorded from the EDP LV distribution network, an automatic learning system has been implemented that groups and orders these alarms, helping the grid distribution operator to direct network technicians to the right and most urgent places, where a grid failure is occurring, is beginning to occur, or is likely to occur.
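The grouping-and-ordering step can be sketched minimally: collect voltage alarms per feeder and rank feeders by alarm count, so technicians visit the busiest location first. The record layout (`meter`, `feeder`, `kind`) and the count-based ranking are assumptions for illustration, not details taken from the paper:

```python
from collections import defaultdict

def prioritize_alarms(alarms):
    """Group smart-meter voltage alarms by feeder and rank feeders
    by alarm count (most alarms first)."""
    groups = defaultdict(list)
    for alarm in alarms:
        groups[alarm["feeder"]].append(alarm)
    return sorted(groups.items(), key=lambda kv: -len(kv[1]))

alarms = [
    {"meter": "m1", "feeder": "F1", "kind": "undervoltage"},
    {"meter": "m2", "feeder": "F1", "kind": "undervoltage"},
    {"meter": "m3", "feeder": "F2", "kind": "overvoltage"},
]
ranked = prioritize_alarms(alarms)
print([feeder for feeder, _ in ranked])  # → ['F1', 'F2']
```

A real deployment would weight alarm kinds and recency rather than raw counts, but the group-then-order structure is the same.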
News Session-Based Recommendations using Deep Neural Networks
News recommender systems aim to personalize users' experiences and help them discover relevant articles in a large and dynamic search space. The news domain is therefore a challenging scenario for recommendation, due to its sparse user profiles, fast-growing number of items, accelerated decay of item value, and dynamic shifts in user preferences. Promising results have recently been achieved by applying Deep Learning techniques to Recommender Systems, especially for item feature extraction and for session-based recommendations with Recurrent Neural Networks. This paper proposes an instantiation of CHAMELEON -- a Deep Learning meta-architecture for news recommender systems. The architecture is composed of two modules: the first learns representations of news articles from their text and metadata, and the second provides session-based recommendations using Recurrent Neural Networks. The recommendation task addressed in this work is next-item prediction for user sessions: "what is the next most likely article a user might read in a session?" The architecture leverages user session context to provide additional information in the extreme cold-start scenario of news recommendation. User behavior and item features are merged in a hybrid recommendation approach. A temporal offline evaluation method is also proposed as a complementary contribution, for a more realistic evaluation of this task, considering dynamic factors that affect global readership interests such as popularity, recency, and seasonality. Experiments were performed with an extensive number of session-based recommendation methods, and the proposed instantiation of the CHAMELEON meta-architecture obtained a significant relative improvement in top-n accuracy and ranking metrics (10% on Hit Rate and 13% on MRR) over the best benchmark methods.
Comment: Accepted for the Third Workshop on Deep Learning for Recommender Systems (DLRS 2018), October 02-07, 2018, Vancouver, Canada.
https://recsys.acm.org/recsys18/dlrs
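As a hedged illustration of the next-item prediction task itself (not of CHAMELEON, which uses recurrent networks over learned article representations), a first-order Markov baseline already captures the "what comes next in this session?" formulation:

```python
from collections import defaultdict, Counter

def train_markov(sessions):
    """First-order transition counts: item -> Counter of items seen next."""
    trans = defaultdict(Counter)
    for session in sessions:
        for cur, nxt in zip(session, session[1:]):
            trans[cur][nxt] += 1
    return trans

def recommend_next(trans, current_item, k=3):
    """Top-k most frequent successors of the current item."""
    return [item for item, _ in trans[current_item].most_common(k)]

sessions = [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]]
model = train_markov(sessions)
print(recommend_next(model, "b", k=2))  # → ['c', 'd']
```

Such simple baselines are among the methods session-based recommenders are typically benchmarked against.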
Filtrando atributos para mejorar procesos de aprendizaje
IX Conferencia de la Asociación Española para la Inteligencia Artificial, Gijón, Spain
Machine learning systems have traditionally been used to extract knowledge from sets of examples described by attributes. When the input information represents a real problem, it is generally not known which attributes influence its solution. In those cases, the only a priori option is to use all the available information. To avoid the problems this entails, attribute filtering can be applied prior to learning, keeping only the most relevant attributes: those that hold the solution to the problem. This article describes a method that performs this selection. As will be shown, this technique improves subsequent learning processes.
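A minimal sketch of filtering attributes before learning: score each attribute by its absolute Pearson correlation with the target and keep the top k. The correlation criterion is an assumption chosen for illustration, not the selection method the paper describes:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def filter_attributes(X, y, k):
    """Indices of the k attributes most correlated (in absolute value)
    with the target, strongest first."""
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

X = [[1, 5, 9], [2, 3, 2], [3, 8, 7], [4, 1, 1]]
y = [1.0, 2.0, 3.0, 4.0]
print(filter_attributes(X, y, k=1))  # → [0], attribute 0 tracks y exactly
```

The learner is then trained only on the selected columns, which is the "keep only the relevant attributes" step the abstract refers to.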
Automatic plankton quantification using deep features
The study of marine plankton data is vital to monitoring the health of the world’s oceans. In recent decades, automatic plankton recognition systems have proved useful for addressing the vast amount of data collected by specially engineered in situ digital imaging systems. Initially, these systems were developed and put into operation using traditional automatic classification techniques fed with hand-designed local image descriptors (such as Fourier features), obtaining quite successful results. In the past few years, there have been many advances in the computer vision community with the rebirth of neural networks. In this paper, we show how descriptors computed using Convolutional Neural Networks (CNNs) trained with out-of-domain data can replace hand-designed descriptors in the task of estimating the prevalence of each plankton class in a water sample. To achieve this goal, we designed a broad set of experiments that show how effective these deep features are when working in combination with state-of-the-art quantification algorithms.
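Quantification (estimating class prevalence) differs from per-image classification. A binary sketch of the classic Adjusted Classify & Count correction, one of the standard quantification algorithms, illustrates the idea; the paper works with many plankton classes, so this two-class form is a simplification:

```python
def classify_and_count(preds):
    """Naive prevalence estimate: fraction of samples flagged positive."""
    return sum(preds) / len(preds)

def adjusted_count(preds, tpr, fpr):
    """Adjusted Classify & Count: correct the raw rate with the classifier's
    true/false positive rates (estimated beforehand, e.g. by cross-validation),
    then clip to [0, 1]."""
    cc = classify_and_count(preds)
    return max(0.0, min(1.0, (cc - fpr) / (tpr - fpr)))

# A classifier with tpr=0.8, fpr=0.2 that flags half the sample positive
# implies a true prevalence of (0.5 - 0.2) / (0.8 - 0.2) = 0.5.
print(adjusted_count([1, 0] * 50, tpr=0.8, fpr=0.2))
```

In the paper's pipeline, `preds` would come from a classifier trained on CNN deep features rather than hand-designed descriptors.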
Optimizing different loss functions in multilabel classifications
Multilabel classification (ML) aims to assign a set of labels to an instance. This generalization of multiclass classification leads to the redefinition of loss functions, and the learning tasks become harder. The objective of this paper is to gain insight into the relations between optimization aims and some of the most popular performance measures: subset (or 0/1) loss, Hamming loss, and the example-based F-measure. To make a fair comparison, we implemented three ML learners that explicitly optimize each of these measures in a common framework. This can be done by considering a subset of labels as a structured output and using structured output support vector machines tailored to optimize a given loss function. The paper includes an exhaustive experimental comparison. The conclusion is that in most cases the optimization of the Hamming loss produces the best or competitive scores. This is a practical result, since the Hamming loss can be minimized using a set of binary classifiers, one for each label separately, making it a scalable and fast method for learning ML tasks. Additionally, we observe that in noise-free learning tasks optimizing the subset loss is the best option, but the differences are very small. We have also noticed that the biggest room for improvement lies in optimizing an F-measure in noisy learning tasks.
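The two set-based losses discussed above are easy to compute directly. A minimal sketch with 0/1 label matrices (rows are instances, columns are labels) shows why they can disagree: a single wrong bit barely moves the Hamming loss but fully penalizes the instance under the subset loss:

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of individual label assignments that differ."""
    errors = sum(t != p
                 for yt, yp in zip(Y_true, Y_pred)
                 for t, p in zip(yt, yp))
    return errors / (len(Y_true) * len(Y_true[0]))

def subset_loss(Y_true, Y_pred):
    """Fraction of instances whose full label set is not exactly right."""
    return sum(yt != yp for yt, yp in zip(Y_true, Y_pred)) / len(Y_true)

Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 0]]
print(hamming_loss(Y_true, Y_pred))  # 1 differing bit out of 6 ≈ 0.167
print(subset_loss(Y_true, Y_pred))   # 1 of 2 instances wrong → 0.5
```

This gap is exactly why the choice of which loss to optimize matters in the experiments.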
Analysis of nutrition data by means of a matrix factorization method
We present a factorization framework to analyze the data of a regression learning task with two peculiarities. First, inputs can be split into two parts that represent semantically significant entities. Second, the performance of regressors is very low. The basic idea of the approach presented here is to learn the ordering relations of the target variable instead of its exact value. Each part of the input is mapped into a common Euclidean space in such a way that the distance in the common space represents the interaction of both parts of the input. The factorization approach obtains reliable models from which it is possible to compute a ranking of the features according to their responsibility for the variation of the target variable. Additionally, the Euclidean representation of the data provides a visualization in which metric properties have a clear semantics. We illustrate the approach with a case study: the analysis of a dataset on the variations of Body Mass Index for Age of children after a Food Aid Program deployed in poor rural communities in Southern México. In this case, the two parts of the input are the vectorial representations of children and their diets. In addition to discovering latent information, the mapping of inputs allows us to visualize children and diets in a common metric space.
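The core idea, mapping both parts of the input into one Euclidean space where distance encodes interaction strength, can be sketched with fixed embeddings. The vectors below are made up for illustration; in the paper they are learned from the ordering relations of the target:

```python
import math

def interaction(child_vec, diet_vec):
    """Predicted interaction: negative Euclidean distance in the shared
    space, so closer pairs mean a stronger effect on the target."""
    return -math.dist(child_vec, diet_vec)

def rank_diets(child_vec, diets):
    """Order diet names by predicted interaction for one child, best first."""
    return sorted(diets, key=lambda name: -interaction(child_vec, diets[name]))

child = [0.0, 1.0]                              # embedding of one child
diets = {"d1": [0.0, 1.2], "d2": [3.0, 3.0]}    # embeddings of two diets
print(rank_diets(child, diets))  # → ['d1', 'd2']
```

Because both entity types live in the same space, the same coordinates drive prediction, ranking, and the visualization the abstract mentions.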
Utilización de técnicas de Inteligencia Artificial en la clasificación de canales bovinas
This paper presents an application of Artificial Intelligence techniques in the food industry. A methodology has been developed for representing the conformation of bovine carcasses, synthesizing expert knowledge by means of Machine Learning tools. The results obtained demonstrate the viability of automatic classifiers, which are able to perform their task effectively with a substantial reduction of the initial number of attributes. This work opens a wide range of possibilities for applying Machine Learning in the food industry.
Learning to assess from pair-wise comparisons
In this paper we present an algorithm for learning a function able to assess objects. We assume that our teachers can provide a collection of pairwise comparisons but encounter certain difficulties in assigning a number to the qualities of the objects considered. This is a typical situation when dealing with food products, where it is very interesting to have repeatable, reliable mechanisms that are as objective as possible to evaluate quality, in order to provide markets with products of uniform quality. The same problem arises when trying to learn user preferences in an information retrieval system or when configuring a complex device. The algorithm is implemented using a growing variant of Kohonen’s Self-Organizing Maps (growing neural gas), and is tested with a variety of data sets to demonstrate the capabilities of our approach.
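A minimal sketch of learning an assessment function from pairwise comparisons: a perceptron on difference vectors that pushes score(a) above score(b) for each preferred pair (a, b). This is a simple linear stand-in chosen for illustration, not the paper's growing neural gas approach:

```python
def train_ranker(pairs, dim, epochs=50, lr=0.1):
    """Learn weights w so that w·a > w·b for every (preferred, worse)
    pair (a, b), via perceptron updates on the difference vector."""
    w = [0.0] * dim
    for _ in range(epochs):
        for a, b in pairs:
            diff = [ai - bi for ai, bi in zip(a, b)]
            if sum(wi * di for wi, di in zip(w, diff)) <= 0:  # pair violated
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def assess(w, x):
    """The learned assessment function: a single numeric score."""
    return sum(wi * xi for wi, xi in zip(w, x))

# (preferred, worse) pairs: quality grows with the first feature
pairs = [([3.0, 1.0], [1.0, 1.0]), ([2.0, 0.5], [0.5, 0.5])]
w = train_ranker(pairs, dim=2)
print(assess(w, [3.0, 1.0]) > assess(w, [1.0, 1.0]))  # → True
```

The key property is shared with the paper's setting: the teacher never supplies absolute numbers, only which object of each pair is better.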
Binary relevance efficacy for multilabel classification
The goal of multilabel (ML) classification is to induce models able to tag objects with the labels that best describe them. The main baseline for ML classification is Binary Relevance (BR), which is commonly criticized in the literature because of its label independence assumption. Despite this fact, this paper discusses some interesting properties of BR, mainly that it produces optimal models for several ML loss functions. Additionally, we present an analytical study of ML benchmark datasets, pointing out some shortcomings. As a result, this paper proposes the use of synthetic datasets to better analyze the behavior of ML methods in domains with different characteristics. To support this claim, we perform some experiments using synthetic data proving the competitive performance of BR with respect to a more complex method in difficult problems with many labels, a conclusion which was not stated by previous studies.
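Binary Relevance itself is easy to sketch: one independent binary classifier per label. The toy 1-nearest-neighbour base learner below is an assumption for illustration; any binary classifier can be plugged in:

```python
import math

class NearestNeighbour:
    """Toy 1-NN binary classifier used here as the base learner."""
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, X):
        return [self.y[min(range(len(self.X)),
                           key=lambda i: math.dist(self.X[i], x))]
                for x in X]

class BinaryRelevance:
    """Train one independent binary classifier per label column."""
    def __init__(self, base_factory):
        self.base_factory = base_factory
        self.models = []
    def fit(self, X, Y):
        self.models = [self.base_factory().fit(X, [row[j] for row in Y])
                       for j in range(len(Y[0]))]
        return self
    def predict(self, X):
        cols = [m.predict(X) for m in self.models]
        return [list(bits) for bits in zip(*cols)]

X = [[0.0, 0.0], [1.0, 1.0]]
Y = [[1, 0], [0, 1]]  # each row is the label set of one instance
br = BinaryRelevance(NearestNeighbour).fit(X, Y)
print(br.predict([[0.1, 0.1]]))  # → [[1, 0]]
```

The per-label decomposition is exactly what makes BR scalable, and, as the abstract notes, it is also what the label-independence criticism targets.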